Abstract

In this paper we apply techniques of complex network analysis to data sources 
representing public funding programs and discuss the importance of the 
considered indicators for program evaluation. Starting from the Open Data repository of the 2007-2013 Italian program Programma Operativo Nazionale “Ricerca e Competitività” (PON R&C), we build a set of data models and perform network analysis over them. We discuss the obtained experimental results, outlining interesting new perspectives that emerge from the application of the proposed methods to the socio-economic evaluation of funded programs.
1 Introduction

Since the last years of the past century, the importance of basing policies on evidence, data, and analysis has quickly spread all over the world. The Evidence-Based Policy movement [1, 2, 3, 4, 5] has grown enormously, and almost all public administrations now focus on maximising utility and adopt a pragmatic problem-solving approach to socio-economic issues [6]. In this respect, the evaluation of public funding programs is a field of great interest for policymakers and economists. Politicians and technicians need to estimate the impact that funding has on life and society, in order to steer future programs and to revise their decisions. Many standard and advanced statistical methods are commonly used for this purpose, such as linear/nonlinear regression, Bayesian inference, machine learning, data mining, and so on. In this paper we suggest new indicators, drawn from network analysis, that can help highlight in a quantitative way important effects that are not usually considered, since they lie outside the domain of investigation of standard statistical tools. This certainly does not mean that program evaluation cannot be performed without network analysis, but rather that such techniques can hopefully provide valuable insight about public funding programs and help increase the objectivity of the extracted results. Recently, a growing interest in complex network analysis applied to evaluation can be seen both in the literature [7, 8, 9, 10, 11, 12] and in institutional reports [13]. The indicators we suggest can be used by experts in program evaluation for their analyses, giving them the opportunity to consider and quantitatively measure important features of funding programs, such as the relations between the actors involved in them. Social network analysis is a particularly suitable tool to extract information about relations among the different components of a system. Investigating the relations between the actors participating in a program can be of interest, since it can, e.g., reveal structural contradictions in the organisation of the different levels involved [10]. Considering the set of projects, research institutions and enterprises that participate in a funding program as a complex dynamical system, it is possible to identify underlying network structures simply by defining the edges according to some relation among the components that is of interest for the evaluator. Once the network is constructed, its global and local properties can be evaluated and discussed.

From a data collection perspective, the proposed analysis can benefit from currently emerging technologies and from the precise guidelines issued by European governmental institutions to support initiatives such as Smart Cities & Communities¹ and their related action goals (Urban and Citizen App, e-Government, e-Democracy and so on). All these initiatives have produced a large number of freely available datasets containing information collected by national governments, which third parties are encouraged to use for their own purposes, analyse and republish as they wish, without copyright restrictions. Recently, Open Government Data (OGD) has emerged as a major movement in knowledge sharing. It promotes transparency and accountability, enables collaboration among stakeholders, and encourages novel socio-economic activities and the growth of the so-called network economy. Starting from the idea that a culture of collaboration and participation among the relevant stakeholders cannot be established without sharing information, the Linked Open Data (LOD) [14] movement, which provides existing data in a machine-readable format, has gained considerable importance over the last years. From such a perspective, LOD facilitates innovation and knowledge creation from interlinked data, but it also introduces a level of complexity for information management and integration. Seeking a good trade-off between data expressiveness and the computational cost of data analysis, we have selected only Open Data repositories without linked data and RDF² triples. Although the movement aims to reach the largest possible portion of users, our investigation has shown that such datasets are usually of heterogeneous quality and size, and that their analysis requires a pre-processing phase made of typical ETL (Extract-Transform-Load) [15] and data cleaning procedures. It is worth mentioning that some problems are commonly encountered when using network analysis for evaluation, such as concerns about the anonymity of non-aggregated data (and their possible anonymisation), or the fact that making results public usually interferes with the structure of the network itself [9]. These kinds of problems are mitigated by using Open Data, since such data are public “by construction”.

The paper is organised as follows: in Sec. 2 we introduce the steps composing the overall analysis process. In Sec. 3, in order to keep the paper self-contained, we describe the structure of the Open Data repository of the 2007-2013 Italian program Programma Operativo Nazionale “Ricerca e Competitività” (PON R&C) and introduce the data model for network analysis; in Sec. 4 we present the features and properties of the analysed network. In order to better discuss the experimental results, we distinguish among local properties, global properties and community structure. Conclusions and perspectives close the paper.

2 Methodology 

The approach followed in this paper consists of two main steps shown in Fig. 1: 

¹ http://ec.europa.eu/eip/smartcities/

² Resource Description Framework: http://www.w3.org/standards/techs/rdf#w3c_all
Figure 1 Data-flow schema of the overall analysis process


• processing of data sources (grey blocks) 

• finding analysis models and metrics (blue blocks) 

The first blocks of Fig. 1 involve the transformation of a general-purpose dataset into an analysis-specific one through the implementation of a data model. The newly obtained dataset is then used for network analysis. The graph is constructed by identifying the nodes and the property that defines whether two nodes are linked or not (in our case: being partners in the same project). Global and local properties are extracted, in order to produce a qualitative and quantitative description of the structure of the network of relationships generated by the program under examination. As represented in Fig. 1, the overall process ends when a report summarising the analysis and its outcomes is produced. The numerical outcomes are described in terms of social/economic effects, in order to provide the evaluator with a useful tool for her/his purposes. We report on the SQL-based [16] modelling approach, which allows us to translate a given dataset in Open Data format into the reference set of analysis models (relational tables), and on the metrics selected as relevant for an effective network analysis. It is worth underlining that we have adopted a well-established relational model to store the data, in order to ensure the integrity of our knowledge base and the significance of the results. Our current implementation of the data models exploits the open source object-relational DBMS PostgreSQL 9.3. For network analysis we have used the Wolfram Mathematica software and the R programming language.
3 PON R&C: From datasets to data models

In this Section, the data-driven steps shown in blocks 2-6 of Fig. 1 are described. As mentioned above, we have selected the Open Data about the PON R&C funding program, publicly available at http://www.ponrec.it/open-data/. The program, funded with European Structural Funds managed by the Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR, Ministry of Education, University and Research) and the Ministero dello Sviluppo Economico (MiSE, Ministry of Economic Development), involved four less developed regions in Southern Italy: Apulia, Campania, Calabria and Sicily. The main aim of the program is to promote socio-economic growth by supporting research and innovation activities, improving the quality of life of citizens and the competitiveness of small and medium-sized enterprises (SMEs). The main features of PON R&C can be summarised as follows: 2962 funded projects, for over 3 billion euro, 11 action programs and 8 action areas: Health-care, Nutrition, Energy, Environment & Ecology, Transportation & Logistics, Cultural Heritage & Activities, Smart Cities, Social Innovation. The group of all the partners involved in each funded project is called a Temporary Scope Association (TSA) and plays a fundamental role in our network analysis. The downloaded repository is rated 3 LOD stars³ [17], was last updated on 2014-06-17, and is composed of 3 datasets (files):

• Projects - 10104 tuples with 52 attributes describing project information 
about program references, activities, textual description of project scope and 
objectives, details about partners and so on; 

• Locations - 11390 tuples with 8 attributes describing details about the geographical localisation of project partners;

• Budgets - 5670 tuples with 13 attributes describing details about amount 
and state of project funding 

and one metadata file describing the structure and meaning of each of the previous files, according to the Open Data standard. Tab. 1 shows a sketch of the Projects, Locations and Budgets files, reporting the information useful for the following discussion as attribute/value pairs. It is important to underline that the approach taken here is somewhat different from the ones usually adopted when performing network analysis in other fields. We have adopted intensive database techniques while, for example, in a social network of scientific collaboration, authors of scientific papers with the same name are handled automatically by computer algorithms, and errors such as the failure to identify the same author represented by two different names (e.g. J. Smith and John Smith) are not solved, but just treated statistically [18, 19]. We think that, while this is perfectly reasonable in that case, when studying a productive system like the one we are interested in, all the actors must be correctly identified and, in general, errors like these should be reduced to the minimum, in order for the analysis to be reliable and really usable by policymakers.
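The requirement that every actor be identified unambiguously can be illustrated with a small Python sketch built on the example rows of Tab. 1. It groups the names found for each fiscal code (FC) and keeps one label per code, then reduces the rows to distinct project/partner pairs. The selection rule (most frequent name, ties broken by the shortest one) is a hypothetical stand-in for the authors' “convenient choice”, and `ROWS`, `canonical_labels` and `project_partner_pairs` are illustrative names, not part of the PON R&C data.

```python
from collections import Counter, defaultdict

# Hypothetical rows mirroring Tab. 1: (source file, UPC, FC, name).
ROWS = [
    ("Projects", "PON04a2_A", "84001850589", "INFN"),
    ("Projects", "PON04a2_A", "84001850589", "INFN"),  # duplicated row
    ("Projects", "PON04a2_A", "80002170720", "UNIBA"),
    ("Locations", "PON04a2_A", "84001850589", "I.N.F.N."),
    ("Locations", "PON04a2_A", "80002170720", "UNIBA"),
    ("Budgets", "PON04a2_A", "84001850589", "INFN - Apulia"),
    ("Budgets", "PON04a2_A", "80002170720", "University of Bari"),
]


def canonical_labels(rows):
    """Map every fiscal code (FC) to a single label.

    Hypothetical rule: the most frequent name wins; ties are broken by the
    shortest name and then alphabetically.
    """
    names_by_fc = defaultdict(Counter)
    for _source, _upc, fc, name in rows:
        names_by_fc[fc][name.strip()] += 1
    return {
        fc: min(counts, key=lambda n: (-counts[n], len(n), n))
        for fc, counts in names_by_fc.items()
    }


def project_partner_pairs(rows):
    """Distinct (UPC, FC) pairs, i.e. the many-to-many Project/Partner relation."""
    return sorted({(upc, fc) for _source, upc, fc, _name in rows})


print(canonical_labels(ROWS))  # {'84001850589': 'INFN', '80002170720': 'UNIBA'}
```

Keying everything on the FC rather than on the name is what makes the duplicated INFN rows collapse into a single partner.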

In order to accomplish this task, we have defined a proper database model able to store all the project data. First, we have created one table for each of the previous datasets, exploiting the following keys to link tables: the UPC attribute is the

³ The path from Open Data to Linked Open Data was first introduced by Sir Tim Berners-Lee in the 5 Stars Model at the Gov 2.0 Expo in Washington DC in 2010, where the costs and benefits for both publishers and consumers of LOD are explained.
Project

| UPC | title | smart cities | social innovation | healthcare | FC | name |
|---|---|---|---|---|---|---|
| PON04a2_A | PRISMA | 1 | 0 | 1 | 84001850589 | INFN |
| PON04a2_A | PRISMA | 1 | 0 | 1 | 84001850589 | INFN |
| PON04a2_A | PRISMA | 1 | 0 | 1 | 80002170720 | UNIBA |

Location

| UPC | FC | name | kind | region |
|---|---|---|---|---|
| PON04a2_A | 84001850589 | I.N.F.N. | PRI | Apulia |
| PON04a2_A | 80002170720 | UNIBA | University | Apulia |

Budget

| UPC | FC | name | total_cost | total_funded |
|---|---|---|---|---|
| PON04a2_A | 84001850589 | INFN - Apulia | 2231915.7 | 1785532.57 |
| PON04a2_A | 80002170720 | University of Bari | 2052539 | 1642031.2 |

Table 1 Sketch of the structure of the original files from PON R&C. Example rows report some features of the project titled PRISMA (UPC = PON04a2_A). Note how: 1) INFN (FC 84001850589) appears three times with three different values of the name attribute: INFN, I.N.F.N. and INFN - Apulia; 2) in the Project table INFN has two duplicate rows.


unique identifier of projects (Unified Project Code) and FC is the unique identifier of partners (Fiscal Code). In order to improve dataset quality, we have fixed the encoding of textual descriptions and the format of numerical values. Moreover, we have overcome name mismatches by inserting a unique label for each partner. Such a label represents a convenient choice among the multiple names associated with the same fiscal code (FC) in the original datasets⁴ (e.g., for the names “I.N.F.N.” and “INFN - Apulia”, associated with the same FC “84001850589”, we have chosen the label “INFN”). We underline that this data cleaning step is a key aspect of evaluations based on network analysis, in which results are sensitive to missing data and it is not possible to sample the population to extract useful information [20]. We are aware that such a problem could be much more evident in very large databases (i.e. those containing millions of tuples or more, rather than thousands as in the case under examination here), and we think the only viable solution is to push institutions towards producing better open data. After the procedure described above, we have obtained a normalised database by designing a many-to-many relation between the Project and Partner tables and deleting duplicated and ill-formed tuples. Using standard SQL queries, we have selected from our database 300 projects with at least 2 partners (thus suitable for network analysis). These involve 769 distinct partners, for a total project cost of 2500 M€ (around 78% of the total cost of PON R&C projects), divided into Universities (33), Public Research Institutes (21), non-Public Research Institutes (44), Micro Enterprises (203), Small Enterprises (232), Medium Enterprises (58) and Large Enterprises (163). It is significant that ≈ 10% of the total number of funded projects, namely the ones involving a network of relations, accounts for ≈ 78% of the total budget. This is an indication of the importance of relations in the Italian productive system. We note that 24 partners, which we call N.C. partners, are not classified in the original datasets, and that the Social Innovation action area does not include any project. In Fig. 2, the distribution of funding for the selected 300 projects is shown. Fig. 2 (part (a))


⁴ See, e.g., the name attribute in Tab. 1.
Figure 2 Distribution of funding for 300 projects within the PON R&C program. Part (a) represents the cost distribution among the different kinds of partners (Large, Medium, Small and Micro Enterprises, Universities, Public and non-Public Research Institutes, N.C.), expressed as a percentage of the total cost; part (b) shows the cost distribution among the action areas mentioned above (Healthcare, Nutrition, Energy, Environment, Transportation & Logistics, Cultural Heritage & Activities, Smart Cities, N.C.), except for the empty Social Innovation area.



Figure 3 Distribution of funding for each kind of partner, for each action area.


represents the cost distribution among the different kinds of partners, expressed as a percentage of the total cost of all selected projects, and Fig. 2 (part (b)) the cost distribution among the action areas mentioned above (except for the empty Social Innovation area). As a general overview, Fig. 3 shows the distribution of funding for each kind of partner, for each action area. In computing these cost distributions, we have used the total_cost attribute, among the several budget-related ones, since it is the only one with a NOT NULL value for every tuple in the original dataset. Starting from the concepts underlying the TSA definition, we have built a set of SQL stored procedures executed on our database. Inputs are the tables and attributes representing significant elements for the construction of the networks, and outputs are ad-hoc generated views with aggregated and derived data. The main generated data views are the following:
• Partner-to-Partner - the distinct pairs of partners involved in the same project;

• Project-to-Project - the distinct pairs of projects having at least one partner in common, together with the number of shared distinct partners;

• Project-to-Partner - the set of distinct partners involved in each project;

• Partner-to-Funding - the funding for each beneficiary, calculated considering all the PON R&C projects it is involved in;

• Beneficiary-to-Beneficiary - the distinct pairs of beneficiaries having at least one project in common, together with the number of shared distinct projects.
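A minimal sketch of how three of these views could be expressed in SQL, assuming a hypothetical cleaned table `project_partner(upc, fc)` with one row per project/partner pair. It uses Python's built-in `sqlite3` module rather than the PostgreSQL 9.3 stored procedures used in the paper, and the toy rows are invented for illustration.

```python
import sqlite3

# Hypothetical cleaned relation: one row per (project UPC, partner FC).
PROJECT_PARTNER = [
    ("P1", "A"), ("P1", "B"), ("P1", "C"),
    ("P2", "B"), ("P2", "C"),
    ("P3", "C"), ("P3", "D"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_partner (upc TEXT, fc TEXT, PRIMARY KEY (upc, fc))")
conn.executemany("INSERT INTO project_partner VALUES (?, ?)", PROJECT_PARTNER)

# Partner-to-Partner: distinct unordered pairs of partners sharing a project.
conn.execute("""
    CREATE VIEW partner_to_partner AS
    SELECT DISTINCT a.fc AS fc1, b.fc AS fc2
    FROM project_partner AS a JOIN project_partner AS b
      ON a.upc = b.upc AND a.fc < b.fc
""")

# Project-to-Project: pairs of projects with the number of shared partners.
conn.execute("""
    CREATE VIEW project_to_project AS
    SELECT a.upc AS upc1, b.upc AS upc2, COUNT(DISTINCT a.fc) AS shared
    FROM project_partner AS a JOIN project_partner AS b
      ON a.fc = b.fc AND a.upc < b.upc
    GROUP BY a.upc, b.upc
""")

# Beneficiary-to-Beneficiary: pairs of partners with the number of shared projects.
conn.execute("""
    CREATE VIEW beneficiary_to_beneficiary AS
    SELECT a.fc AS fc1, b.fc AS fc2, COUNT(DISTINCT a.upc) AS shared
    FROM project_partner AS a JOIN project_partner AS b
      ON a.upc = b.upc AND a.fc < b.fc
    GROUP BY a.fc, b.fc
""")

partner_pairs = conn.execute(
    "SELECT fc1, fc2 FROM partner_to_partner ORDER BY fc1, fc2").fetchall()
project_pairs = conn.execute(
    "SELECT upc1, upc2, shared FROM project_to_project ORDER BY upc1, upc2").fetchall()
beneficiary_pairs = conn.execute(
    "SELECT fc1, fc2, shared FROM beneficiary_to_beneficiary ORDER BY fc1, fc2").fetchall()
print(project_pairs)  # [('P1', 'P2', 2), ('P1', 'P3', 1), ('P2', 'P3', 1)]
```

The condition `a.fc < b.fc` (or `a.upc < b.upc`) in the self-join keeps each unordered pair exactly once and excludes self-pairs.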

4 Network Analysis 

This Section describes the activities corresponding to blocks 7-8 of Fig. 1. The network analysed here is an affiliation network [18, 19], constructed in such a way that every university, research institution or enterprise funded by the program is a vertex of the graph, and there is an edge between two vertices if the corresponding participants are part of at least one common TSA, for at least one funded project. In this way, the graph is the union of complete, undirected, unweighted graphs, each representing a TSA, in which every node is connected to all the others. The network structure arises from vertices participating in more than one project, and hence in more than one TSA. In our analysis we have not considered vertices that have been funded without participating in any TSA. The resulting network is shown in Fig. 4; it has 769 vertices and 4868 edges, and is not connected. It is made of one giant component, composed of 744 vertices and 4845 edges, and 10 small complete graphs of order 5 (one graph), 3 (two graphs) and 2 (seven graphs). Several properties [21, 22] have been extracted from the graph to support the evaluation of the PON R&C public funding program. Such properties belong to two main classes: local and global ones. Local properties are features of single vertices or edges; in particular, centrality coefficients are evaluated in order to understand the importance of the nodes within the network. Global properties, instead, involve the network as a whole, and are used to describe the full program independently of the single nodes.
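The construction described above, one clique per TSA with the graph taken as their union, can be sketched in plain Python as follows. The toy TSAs and the function names are hypothetical; the last two assertions only check that the vertex and edge counts reported in the text add up, since each small component is a complete graph $K_n$ with $n(n-1)/2$ edges.

```python
from itertools import combinations


def affiliation_graph(tsas):
    """Union of complete graphs: every TSA (a list of partner ids) becomes a clique."""
    adj = {}
    for members in tsas:
        members = set(members)
        for u in members:
            adj.setdefault(u, set())
        for u, v in combinations(members, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_count(adj):
    """Each undirected edge appears in two neighbour sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def connected_components(adj):
    """Components as vertex lists, largest first (iterative depth-first search)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


# Hypothetical TSAs: two overlapping projects plus an isolated pair.
G = affiliation_graph([["A", "B", "C"], ["C", "D"], ["E", "F"]])
print(len(G), edge_count(G), [len(c) for c in connected_components(G)])  # 6 5 [4, 2]

# The figures in the text are consistent: giant component plus K5, 2 x K3 and 7 x K2.
assert 744 + 5 + 2 * 3 + 7 * 2 == 769
assert 4845 + 10 + 2 * 3 + 7 * 1 == 4868
```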

4.1 Local properties 

Properties of vertices evaluated in the present analysis include degree centrality, betweenness centrality, closeness centrality, eccentricity, eigenvector centrality, radiality centrality and PageRank centrality, based on the Google PageRank algorithm [23].

The highest values of all centralities are found for public research institutions, such as universities and specific research centres. In particular, the Italian National Research Council (CNR) shows the best values for all the indicators. It is a peculiar vertex of the network, being composed of 104 institutes spread over geographically distributed sites (in all the biggest cities in Italy), and covering a large spectrum of activities in many fields, from pure research to applied disciplines. It would probably be better to split this vertex and consider each site, or department, separately, but the dataset does not contain such details. On the other hand, dividing the CNR into many entities would, in a certain sense, spoil its central role in the Italian panorama. Resolving this issue is interesting, but beyond the scope of the present paper, and is left for future work, when more detailed Open Data become available. Apart from cases like the one described, the central role of public research institutions in the network structure is clear from all the centralities.

Degree centralities are discussed in detail in Sec. 4.2, since global properties of the network can be inferred from the distribution of such quantities, despite their being local in nature.

Betweenness [24] measures the importance of a node for the traffic of information across the network. A large betweenness centrality of a vertex indicates that many shortest paths between pairs of other vertices pass through that node. The relevance of this quantity for program evaluation lies in the possibility of assessing the role of institutions/enterprises in the possible aggregation of “far” nodes. For example, a policymaker interested in promoting a program aimed at aggregating and consolidating the productive system of a region should take care not to spoil the edge betweenness of the network of relations between the actors involved in the program.
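For unweighted graphs, vertex betweenness is commonly computed with Brandes' algorithm: one breadth-first search per source, followed by a backward accumulation of pair dependencies. The following is a minimal Python sketch for an undirected graph stored as a dictionary of neighbour sets (an assumed representation); it returns unnormalised values, whereas tools such as Mathematica or R may apply different normalisations.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: distances, number of shortest paths, predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back towards s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


# Hypothetical path A-B-C: only B lies between another pair of vertices.
PATH = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(betweenness(PATH))  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
```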

Closeness centrality indicates whether a node is at a short average distance from every other reachable vertex, with higher closeness meaning shorter distance. A variant is radiality centrality, which gives higher weight to the neighbourhood of the node. From the social/economic point of view, these quantities indicate how easily an institution/enterprise can connect to all the other members of the network (and, thus, of the productive system). For example, an enterprise with high closeness centrality could be the right promoter for initiatives such as the creation of technological districts, associations or lobbies. Exploiting the information contained in this quantity, a policymaker could more easily steer the productive system in the desired direction with focused regulatory interventions.
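A Python sketch of both quantities on a graph given as a dictionary of neighbour sets. Closeness is taken here as the reciprocal of the mean distance to the reachable vertices; for radiality we assume the common definition $\mathrm{Rad}(v) = \sum_{w \neq v} (D + 1 - d(v,w))/(N-1)$, with $D$ the diameter and $N$ the number of vertices, which requires a connected graph such as the giant component. Other software may use slightly different conventions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """Reciprocal of the mean distance to the reachable vertices (0 if isolated)."""
    result = {}
    for v in adj:
        d = bfs_distances(adj, v)
        total = sum(d.values())
        result[v] = (len(d) - 1) / total if total else 0.0
    return result


def radiality(adj):
    """Assumed definition: mean of (diameter + 1 - d(v, w)) over all w != v.

    Only meaningful for a connected graph with at least two vertices.
    """
    dists = {v: bfs_distances(adj, v) for v in adj}
    diameter = max(max(d.values()) for d in dists.values())
    n = len(adj)
    return {
        v: sum(diameter + 1 - x for w, x in d.items() if w != v) / (n - 1)
        for v, d in dists.items()
    }


# Hypothetical path A-B-C: the middle vertex is closest to everyone.
PATH = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(radiality(PATH))  # {'A': 1.5, 'B': 2.0, 'C': 1.5}
```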

Eccentricity is the maximum distance between a node and any other node in the network. It gives an idea of how central a vertex is within the network, with smaller values corresponding to more central nodes.
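Eccentricity follows directly from a breadth-first search, and the vertices of minimum eccentricity (the radius) form the center of the graph. A minimal Python sketch, assuming a connected graph given as a dictionary of neighbour sets; the helper names are hypothetical.

```python
from collections import deque


def eccentricity(adj, v):
    """Largest shortest-path distance from v to the vertices reachable from it."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def graph_center(adj):
    """Vertices of minimum eccentricity, i.e. the most central ones in this sense."""
    ecc = {v: eccentricity(adj, v) for v in adj}
    radius = min(ecc.values())
    return sorted(v for v, e in ecc.items() if e == radius)


# Hypothetical path graph A-B-C-D-E: the middle vertex is the unique center.
PATH5 = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
print({v: eccentricity(PATH5, v) for v in PATH5})  # {'A': 4, 'B': 3, 'C': 2, 'D': 3, 'E': 4}
print(graph_center(PATH5))  # ['C']
```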

High eigenvector centrality is assigned to vertices that are connected to many other well-connected vertices. It can be used to identify the best way to spread a trend within the productive system represented by the network. A variant of eigenvector centrality is PageRank centrality, another way of measuring the importance of a node within a graph. The original algorithm was created by Larry Page and Sergey Brin in 1996 at Stanford University [25, 26, 27] and was used by Google to rank web pages by importance. The centralities used here are given by the solution of:

$$ r = \alpha A^{T} H r + \frac{1-\alpha}{N}\,\mathbf{1} \qquad (1) $$

where $r$ is the vector of the PageRank centralities of the nodes, $A$ is the adjacency matrix of the graph, $H$ is the diagonal matrix with entries $1/\max\{1, d_i\}$, $d_i$ being the degree of the $i$-th vertex, $N$ is the number of vertices, $\mathbf{1}$ is the all-ones vector, and $\alpha$ is the damping factor, an empirical parameter⁵ usually set to $\alpha = 0.85$ [23]. The constant term accounts for random jumps between nodes and ensures that, for $\alpha < 1$, Eq. (1) has a unique nontrivial solution. This is an example of how fruitful the application of social network analysis to the evaluation of the effects of public funding can be, also by means of well-known and successful algorithms such as PageRank.
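A minimal sketch of PageRank by fixed-point (power) iteration, assuming the standard formulation $r = \alpha A^{T} H r + \frac{1-\alpha}{N}\mathbf{1}$, with $H$ the diagonal matrix of entries $1/\max\{1, d_i\}$ as defined in the text and the constant term modelling random jumps. Because the update is a contraction for $\alpha < 1$, the iteration converges from any start. The graph is assumed undirected and stored as a dictionary of neighbour sets, so $A^{T} = A$; the tolerance and iteration limit are arbitrary choices, not values from the paper.

```python
def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for r = alpha * A^T H r + (1 - alpha) / N.

    H = diag(1 / max(1, d_j)): each vertex j passes r_j / max(1, d_j)
    to every neighbour.
    """
    n = len(adj)
    r = dict.fromkeys(adj, 1.0 / n)
    for _ in range(max_iter):
        new = {
            i: alpha * sum(r[j] / max(1, len(adj[j])) for j in adj[i]) + (1 - alpha) / n
            for i in adj
        }
        if sum(abs(new[v] - r[v]) for v in adj) < tol:
            return new
        r = new
    return r


# Hypothetical star graph: hub "H" linked to three leaves.
STAR = {"H": {"L1", "L2", "L3"}, "L1": {"H"}, "L2": {"H"}, "L3": {"H"}}
scores = pagerank(STAR)
print(round(scores["H"], 4), round(scores["L1"], 4))  # 0.4797 0.1734
```

On a graph without isolated vertices the scores sum to 1, and the hub receives the largest share, as expected.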

In Fig. 5 the largest values of all the centralities of (the giant component of) the PON R&C network are reported. As stated before, the CNR has the highest values for all the centralities, probably because it is spread across the whole country. Nevertheless, it is important to observe that public research institutions, mainly universities, occupy the top positions for every centrality, while in the lowest positions we find private enterprises, no matter whether large or small (except for the Polytechnic of Bari, which is a public university and has a low value of PageRank centrality). The central role of public research institutions in the network of relations underlying (at least part of) the Italian productive system is clear from Fig. 5. Eccentricity is not reported in the figure, since a large number of vertices share the same value of this centrality, meaning that the network is somewhat “equally spaced”. For degree and betweenness centralities only the largest values are reported, since the smallest ones are trivial.

The last centrality considered here is edge betweenness, which is a property of the edges of the network (rather than of the vertices) and measures how central a link is for the connections between nodes. It is measured by counting the number of shortest paths the edge belongs to, and gives a quantitative idea of how important the relation between two institutions/enterprises is for the “communication” among all the actors composing the network. Since the number of edges is much larger than the number of vertices, and the shortest paths between all pairs of nodes must be evaluated, the calculation of this centrality is a resource-intensive process. The largest values of the edge betweenness for (the giant component of) the PON R&C network are reported in Fig. 6. Also in this case, the most important relationships (edges) of the network are the ones between public research institutions, while small enterprises contribute little or nothing to the geodesics.
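Edge betweenness can be obtained with the same Brandes-style accumulation used for vertices, crediting each dependency to the edge it traverses. A minimal Python sketch for an undirected, unweighted graph stored as a dictionary of neighbour sets (an assumption); vertex identifiers must be mutually comparable, because each edge is keyed as a sorted pair.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {}
    for s in adj:
        # BFS from s: distances, number of shortest paths, predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate: the share of paths through edge (v, w) is credited to it.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1 + delta[w])
                edge = tuple(sorted((v, w)))
                eb[edge] = eb.get(edge, 0.0) + c
                delta[v] += c
    # Every unordered pair of endpoints was counted from both sides.
    return {e: c / 2 for e, c in eb.items()}


# Hypothetical graph: triangles A-B-C and D-E-F joined by the edge C-D.
TWO_TRIANGLES = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
eb = edge_betweenness(TWO_TRIANGLES)
print(max(eb, key=eb.get), eb[("C", "D")])  # ('C', 'D') 9.0
```

The bridge C-D carries all 3 × 3 shortest paths between the two triangles, which is why it dominates.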

4.2 Global properties 

The first property analysed here is the degree distribution of the vertices, i.e. the frequencies of the degree centralities described in Sec. 4.1. The importance of such a distribution lies in the possibility of inferring from it information about the topology of the graph, and in particular of understanding whether the network is scale-free [28]. The property of being scale-free is shared by many real networks, which show power-law degree distributions $P(k) = A k^{-\gamma}$, with exponents usually in the range $2 < \gamma < 3$; such distributions have the same functional form at all scales.

This is of particular interest since power laws are commonly associated with second-order phase transitions in dynamical systems. Phase transitions in complex networks represent an interesting research field [29, 30], but the graph considered here is static, so no considerations can be made in this respect. Nevertheless, this is

⁵ Representing the probability that a traveller randomly navigating the network continues doing so at any given step.
an interesting perspective for future work, in which the dynamics can be taken into account.

Scale-free networks have an inhomogeneous degree distribution, with a few nodes (hubs) having far more connections than the average. The hubs follow a hierarchy, in which large ones are connected to smaller ones, which are themselves connected to even smaller ones, and so on. This feature makes the network robust against random failures, since the removal of a random vertex is unlikely to affect the main hubs, and connectedness is not spoiled. Hence, scale-free graphs are a desirable result for policymakers interested in generating a solid network of relationships between productive actors on the territory. Besides being a strong point, hubs also represent a weakness, since their systematic removal would quickly destroy the network. As we show below, the property of being scale-free is an important point for an evaluator to take into account in order to monitor and evaluate the results of funding programs. Moreover, it suggests to decision makers that effort should be put into promoting funding programs from which hubs can profit.
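The contrast between random failures and the targeted removal of hubs can be explored with a short Python sketch that measures the order of the largest connected component after deleting a set of vertices. The toy star graph is a hypothetical, extreme hub-and-spoke example rather than PON R&C data; on a real network one would compare the two strategies over increasing numbers of removed vertices.

```python
import random


def largest_component_size(adj, removed):
    """Order of the largest connected component after deleting the `removed` vertices."""
    alive = set(adj) - set(removed)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def targeted_attack(adj, k):
    """The k vertices of highest degree (the hubs)."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]


def random_failure(adj, k, seed=0):
    """k vertices drawn uniformly at random (seeded for reproducibility)."""
    return random.Random(seed).sample(sorted(adj), k)


# Hypothetical hub-and-spoke graph: hub "H" linked to ten leaves.
STAR = {"H": {f"L{i}" for i in range(10)}}
STAR.update({f"L{i}": {"H"} for i in range(10)})

print(largest_component_size(STAR, []))                        # 11
print(largest_component_size(STAR, targeted_attack(STAR, 1)))  # 1
print(largest_component_size(STAR, ["L0"]))                    # 10
```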

The degree distribution of the PON R&C network is shown in Fig. 7. The tail (starting from the upper bound of the median interval, i.e. $k = 7$) is fitted very well by a power-law function of the form $P(k) = A k^{-\gamma}$ with $A = 4.156 \pm 0.375$ and $\gamma = 1.998 \pm 0.040$. To obtain the fits, a nonlinear regression based on Newton's method [31] has been used. A comparison with another fit, to an exponential distribution $P(k) = A e^{-\gamma k}$ with $A = 0.241 \pm 0.011$ and $\gamma = 0.164 \pm 0.005$, shows that the former fits the distribution slightly better than the latter, with $R^2_{pow} = 0.935$ and $R^2_{exp} = 0.929$. Such a small difference between the values of $R^2$ is not a strong indication that a power law fits the distribution better than an exponential law, but together with the fact (shown below) that higher moments grow, we consider it sufficient to support the power-law nature of the distribution. In fact, for a power-law distribution with a tail of order $O(x^{-\nu})$, the moments of order $n \geq \nu - 1$ diverge, and in general higher-order moments are larger than lower-order ones (this is not true for exponential distributions). Given the known difficulties in assessing the nature of the degree distribution, due to the noise coming from the finiteness of the sample (especially from boundary values), the present result is satisfactory in assessing the scale-free character of the PON R&C network. More refined methods could be used to estimate the parameter $\gamma$ with higher precision, e.g. methods based on the Kolmogorov-Smirnov test [32], but this is outside the purposes of the present work.
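For illustration only, the two candidate laws can be compared with a simpler approach than the nonlinear Newton regression used above: ordinary least squares on log-transformed data (log-log for the power law, semi-log for the exponential). This Python sketch assumes degrees $k$ and nonzero frequencies $P(k)$ taken from the tail; note that an $R^2$ computed in transformed space is not directly comparable with the values reported above, and that maximum-likelihood estimators are generally preferred for power-law exponents.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares y = a + b x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


def fit_power_law(ks, ps):
    """log P = log A - gamma log k; returns (A, gamma, R^2 in log-log space)."""
    a, b, r2 = linear_fit([math.log(k) for k in ks], [math.log(p) for p in ps])
    return math.exp(a), -b, r2


def fit_exponential(ks, ps):
    """log P = log A - gamma k; returns (A, gamma, R^2 in semi-log space)."""
    a, b, r2 = linear_fit(list(ks), [math.log(p) for p in ps])
    return math.exp(a), -b, r2


# Hypothetical noiseless tail following P(k) = 4 k^(-2) for k = 7..49.
KS = list(range(7, 50))
PS = [4.0 * k ** -2 for k in KS]
A_pow, gamma_pow, r2_pow = fit_power_law(KS, PS)
A_exp, gamma_exp, r2_exp = fit_exponential(KS, PS)
print(round(A_pow, 3), round(gamma_pow, 3))  # 4.0 2.0
```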

The distribution has expected value $\langle k \rangle = 12.661$, mode $M = 6$, median $6 < m < 7$, standard deviation $\sigma = 23.781$, skewness $s = 8.876$ and kurtosis $\kappa = 113.882$ (all the moments are shown in Tab. 2).
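Summary statistics of this kind can be computed from a degree sequence with Python's `statistics` module. The conventions are assumptions, since the text does not state them: sample (n - 1) standard deviation, and skewness and non-excess kurtosis from population central moments. The degree sequence in the example is hypothetical; the final assertion checks that the reported mean matches $\langle k \rangle = 2E/N$ for $E = 4868$ edges and $N = 769$ vertices.

```python
import statistics


def degree_moments(degrees):
    """Summary statistics of a degree sequence under the assumed conventions."""
    n = len(degrees)
    mean = statistics.fmean(degrees)
    # Population central moments (1/n normalisation) for skewness and kurtosis.
    m2 = sum((d - mean) ** 2 for d in degrees) / n
    m3 = sum((d - mean) ** 3 for d in degrees) / n
    m4 = sum((d - mean) ** 4 for d in degrees) / n
    return {
        "mean": mean,
        "mode": statistics.mode(degrees),
        "median": statistics.median(degrees),
        "std": statistics.stdev(degrees),  # sample (n - 1) standard deviation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,          # non-excess: a normal distribution gives 3
    }


print(degree_moments([1, 2, 2, 3, 10])["median"])  # 2

# The mean degree of any graph is 2E/N; with the figures in the text this gives 12.661.
assert round(2 * 4868 / 769, 3) == 12.661
```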

Having established this power-law nature, it is interesting to identify the main hubs. In the PON R&C network considered here, the hubs are public research centres, and this represents a strong point for the relationships of the involved productive system. In fact, it is natural, in the lifecycle of a productive system, that some enterprises rise while others fall, which, in the language of networks, corresponds to the random removal of vertices described above. However, as previously stated, the random removal of vertices from a scale-free network does not spoil connectivity, which happens with
| Moment | Value |
|---|---|
| Expected value | $\langle k \rangle = 12.661$ |
| Mode | $M = 6$ |
| Median | $6 < m < 7$ |
| Standard deviation | $\sigma = 23.781$ |
| Skewness | $s = 8.876$ |
| Kurtosis | $\kappa = 113.882$ |

Table 2 Moments of the degree distribution of the PON R&C network.


the systematic removal of the main hubs instead. In this case, it is unlikely that one of the main hubs, identified here with large public research centres, could disappear, since this would mean e.g. the closure of a large public university, a quite rare event. This picture was partly expected, since in many cases it was mandatory to involve public research institutions in the TSAs. Nevertheless, it still represents a strong indication for a decision maker, suggesting that it is “safer” to include public research institutions in a future program, since this is the easiest way to keep a solid relationship network within the productive system.

Another way of assessing whether a network is scale-free consists in evaluating the distribution of local clustering coefficients, i.e. the number of edges connecting the neighbours of each vertex $v$, divided by the number of edges of a complete graph on the same number of vertices as the neighbourhood of $v$ [33]. The PON R&C network is a special case in which local clustering coefficients are less informative, since most of them are close to 1 by construction: the graph is a union of complete graphs (one per TSA), so the neighbourhood of a vertex is likely to be fully connected, which pushes its local clustering coefficient towards one. The global clustering coefficient $C$, i.e. the fraction of paths of length two that are closed (over all paths of length two), is much more significant instead, and it takes the small value $C = 0.215$ for the giant component, meaning that the network is not strongly clustered. From the political and sociological point of view this is an interesting point, since the network is made of “scattered” relationships despite being composed of “closed” TSAs.
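Why a union of cliques yields local coefficients near 1 but a small global coefficient can be seen on a toy example. The sketch below builds the co-participation graph from a list of TSAs (each TSA becomes a complete subgraph) and computes both coefficients as defined above. The TSA list and the vertex names (`HUB`, `P0`, ...) are hypothetical.

```python
from itertools import combinations


def graph_from_tsas(tsas):
    """Co-participation graph: each TSA (a list of partners) becomes a clique."""
    adj = {}
    for members in tsas:
        members = set(members)
        for v in members:
            adj.setdefault(v, set())
        for u, v in combinations(members, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def local_clustering(adj, v):
    """Edges among the neighbours of v over the edges of a complete graph on them."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def global_clustering(adj):
    """Transitivity: closed paths of length two over all paths of length two."""
    closed = paths = 0
    for v, nbrs in adj.items():
        paths += len(nbrs) * (len(nbrs) - 1) // 2  # paths of length two centred on v
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / paths if paths else 0.0


# Hypothetical toy program: one large partner in ten three-member TSAs.
tsas = [["HUB", f"P{i}", f"Q{i}"] for i in range(10)]
g = graph_from_tsas(tsas)
print(local_clustering(g, "P0"))   # 1.0: the neighbourhood of P0 is a single TSA
print(local_clustering(g, "HUB"))  # 10 / 190, about 0.053
print(global_clustering(g))        # 30 / 210 = 1/7, about 0.143
```

Twenty of the twenty-one vertices have local coefficient 1, yet the global coefficient is only about 0.14, because most paths of length two pass through the hub and join different TSAs.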

Other important features that can guide the policymaker in evaluating the effects of the program, or in planning future ones, are the vertex connectivity $V_c$ and the edge connectivity $E_c$, i.e. the smallest number of vertices or edges, respectively, whose removal disconnects the graph. For the case under examination both take the value $V_c = E_c = 1$, meaning that the removal of a single node or edge can be catastrophic for network connectivity. When these parameters are low, identifying and monitoring such critical nodes and edges can be very important in order to keep the network of relations tightly connected.
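For a connected graph, $V_c = 1$ exactly when a cut vertex exists and $E_c = 1$ exactly when a bridge exists, so the critical nodes and edges mentioned above can be listed directly. The brute-force sketch below removes each vertex or edge in turn and tests connectivity with a breadth-first search; it only detects the value-1 case (larger values require max-flow methods), and linear-time alternatives based on Tarjan's depth-first search exist. The toy graph is hypothetical.

```python
from collections import deque


def is_connected(adj, skip_vertex=None, skip_edge=None):
    """BFS connectivity test, optionally ignoring one vertex or one edge."""
    nodes = [v for v in adj if v != skip_vertex]
    if not nodes:
        return True
    banned = set(skip_edge) if skip_edge is not None else None
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w == skip_vertex or w in seen or {u, w} == banned:
                continue
            seen.add(w)
            queue.append(w)
    return len(seen) == len(nodes)


def cut_vertices(adj):
    """Vertices whose removal disconnects a connected graph (so V_c = 1)."""
    return [v for v in adj if not is_connected(adj, skip_vertex=v)]


def bridges(adj):
    """Edges whose removal disconnects a connected graph (so E_c = 1)."""
    edges = {tuple(sorted((u, w))) for u in adj for w in adj[u]}
    return sorted(e for e in edges if not is_connected(adj, skip_edge=e))


# Hypothetical toy graph: two triangles joined by the single edge C-D.
adj = {}
for u, v in [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(cut_vertices(adj))  # ['C', 'D']
print(bridges(adj))       # [('C', 'D')]
```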

Another important property of scale-free networks is that they are small-world networks [34]. This means that relatively short paths exist between any two nodes (with respect to the large size of the graph), with an average shortest path length $L$ (i.e. the average length of all shortest paths between pairs of vertices of the graph) scaling as $L \sim O(\log N)$, $N$ being the total number of vertices. This is due to the existence of links between vertices belonging to distant parts of the graph, which connect them and reduce distances to a few hops. In scale-free networks such vertices are usually the hubs, and the small-world property is enhanced when $2 < \gamma < 3$, where $L \sim O(\log\log N)$ (while $L \sim O(\log N)$ when $\gamma > 3$) [35]. For the PON R&C network $\gamma = 1.998 \pm 0.040$, $L = 2.532$, $\log N = 6.645$ and $\log\log N = 1.889$, so the small-world property is enhanced, as expected for a power-law degree distribution with $\gamma$ at the lower edge of the range $2 < \gamma < 3$ (and compatible with it within the fit uncertainty). Again, this cannot be considered a smoking gun proving that the network is scale-free, but just another indication in addition to the ones mentioned above.
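The average shortest path length $L$ can be obtained with one breadth-first search per vertex. The sketch below assumes a connected, unweighted graph and takes $\log$ as the natural logarithm, which reproduces $\log N = 6.645$ for $N = 769$. The star-shaped toy graph is hypothetical and shows how a single hub keeps all distances within two hops.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_shortest_path_length(adj):
    """Mean distance over ordered pairs of distinct vertices (graph assumed connected)."""
    total = pairs = 0
    for s in adj:
        d = bfs_distances(adj, s)
        total += sum(d.values())
        pairs += len(d) - 1
    return total / pairs


def small_world_scales(n):
    """Reference scales log N and log log N (natural logarithms assumed)."""
    return math.log(n), math.log(math.log(n))


# Hypothetical toy graph: a hub linked to ten leaves (a star).
star = {"hub": {f"v{i}" for i in range(10)}}
for i in range(10):
    star[f"v{i}"] = {"hub"}
print(average_shortest_path_length(star))  # 200 / 110, about 1.82
print(small_world_scales(len(star)))
print(round(math.log(769), 3))  # 6.645, the value of log N quoted above
```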

Other global features of the network are the radius $\mathcal{R}$ and the diameter $\mathcal{D}$ of the graph, defined as the minimum and the maximum eccentricity over all vertices, respectively, the eccentricity of a vertex being the length of the longest shortest path from that vertex to any other vertex of the graph. For the PON R&C network $\mathcal{D} = 5$ and $\mathcal{R} = 3$, meaning that no vertex is more than 5 hops away from any other node, and that for every source the farthest destination is at least 3 hops away. From the point of view of program evaluation, this means that PON R&C has been successful in creating (or intersecting) a network of close relationships between the funded actors. Since promoting such a relationship network was among the interests of the program design, these quantities could serve as good ex post indicators of the quality of the results obtained.
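Radius, diameter and centre all derive from the eccentricities, which one breadth-first search per vertex provides. A minimal sketch, assuming a connected graph such as the giant component; the path-shaped toy graph is hypothetical.

```python
from collections import deque


def eccentricities(adj):
    """Eccentricity of every vertex: its largest BFS distance to any other vertex.

    Assumes a connected graph (e.g. the giant component); on a disconnected
    graph unreachable vertices would simply be ignored.
    """
    ecc = {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ecc[s] = max(dist.values())
    return ecc


def radius_diameter_centre(adj):
    """Radius (min eccentricity), diameter (max eccentricity) and centre."""
    ecc = eccentricities(adj)
    radius, diameter = min(ecc.values()), max(ecc.values())
    centre = sorted(v for v, e in ecc.items() if e == radius)
    return radius, diameter, centre


# Hypothetical toy graph: the path 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(radius_diameter_centre(path))  # (2, 4, [2])
```

The same eccentricities give the centre of the graph, i.e. the set of vertices with minimum eccentricity, discussed next.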

The centre of the graph (i.e. the set of vertices with minimum eccentricity) is shown in Fig. 4. It includes public research centres like CNR (which is also the main hub) and ENEA, all the major universities involved in the program (Bari, Calabria, Catanzaro, Foggia, Naples, Palermo, Salento, Salerno), private research centres like CETMA, and also some large private enterprises like Avio S.p.A., Engineering S.p.A., IBM, SELEX S.p.A., and EXEURA S.r.l. This is a strong indication that the network of funded projects gravitates around large poles involving research centres (public and private), which turn out to have a key role in aggregating entities. This could also explain the scale-free property of the graph, since preferential attachment, in which nodes prefer to link to vertices with high degree, is known to be a generating mechanism for this kind of network [36, 37, 38]. It is reasonable to imagine that many small actors prefer forming TSAs that include large research organisations, which are usually able to obtain more funds, rather than forming TSAs among themselves.

From the point of view of social networks and relationships, it is particularly interesting to study this feature side by side with assortativity [39], which indicates whether the nodes of the graph tend to connect with their connectivity peers (vertices with similar degree) or not. In the first case the network is said to be assortative, while in the second case it is anti-assortative. This feature is quantitatively measured through the assortativity coefficient $r$, whose range is $-1 \le r \le 1$, $r = 1$ ($-1$) meaning a perfectly (anti-)assortative graph and $r = 0$ indicating no particular preference for the majority of the nodes. In the present network $r = -0.173$, meaning that the graph is slightly anti-assortative: the productive system funded by the program has a mild tendency not to form lobbies among important actors, but rather to associate strongly connected hubs with smaller and less connected enterprises and institutions. From the socio-economical point of view, it seems reasonable that small enterprises turn to larger ones or to big research centres to benefit from sharing and collaboration. This is an interesting result, since most social networks show assortative mixing by node degree [40], and it also has some implications for the topology of the network. First, anti-assortative networks are more susceptible to the removal of high-degree nodes (here represented by universities and research centres), which signals to the policymaker the importance of public research in the productive system and the possible disruptive effect of underestimating it. Second, in anti-assortative networks epidemics spread to larger portions of the nodes than in similar assortative ones. This suggests that being anti-assortative is preferable for the spreading of knowledge and know-how in the productive system, making it more efficient. It is worth noting that in a recent work [12] a social network similar to the one studied here, concerning the funding of FP7 (Seventh Framework Programme) European research projects, was also found to be anti-assortative, and conclusions close to the ones put forward here were drawn. This could indicate a structural feature shared by graphs constructed from public funding programs, and we plan to investigate this point further in future work.

Lastly, link efficiency is a measure of traffic capacity within the network, representing how efficiently information can be transmitted along the graph. This parameter takes the very high value $\mathcal{E} = 0.999$ in the PON R&C network, which is a strong indicator of robustness for the relations between vertices, especially in a graph with density as small as $\rho = 0.017$.
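The assortativity coefficient $r$ can be computed, following Newman's definition, as the Pearson correlation between the degrees found at the two ends of each edge. A minimal sketch on an edge list; the two toy graphs are hypothetical and illustrate the extreme values $r = -1$ and $r = 1$.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    Each undirected edge is counted in both directions so that the result
    does not depend on edge orientation. Undefined (division by zero) when
    every edge end has the same degree, e.g. in a regular graph.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


hub_and_spokes = [("hub", f"s{i}") for i in range(4)]  # hypothetical star
print(degree_assortativity(hub_and_spokes))  # -1.0: perfectly anti-assortative
peers = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]  # triangle plus a separate edge
print(degree_assortativity(peers))           # 1.0: every edge joins equal degrees
```

A value such as $r = -0.173$ sits between these extremes: hubs are linked to low-degree vertices somewhat more often than degree-blind mixing would produce.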

The study of the global properties of the PON R&C graph is given as an example showing how network analysis provides a concrete way of examining the role of the funded actors within a program, supporting its ex post evaluation through the introduction of rather innovative indicators. In particular, it describes the structure of the productive system, highlighting the nodes that are key for network connectivity or that play a central role, through quantitative (thus evaluable) indicators, which for the present case are summarised in Tab. 3.


Property                        Value
Radius                          $\mathcal{R} = 3$
Diameter                        $\mathcal{D} = 5$
Density                         $\rho = 0.017$
Global clustering coefficient   $C = 0.215$
Vertex connectivity             $V_c = 1$
Edge connectivity               $E_c = 1$
Average shortest path length    $L = 2.532$ ($\sim O(\log\log N)$)
Link efficiency                 $\mathcal{E} = 0.999$
Assortativity coefficient       $r = -0.173$

Table 3 Global properties of the PON R&C network.
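Two entries of Tab. 3 are simple closed-form quantities. The density is the fraction of possible undirected edges that are present, $\rho = 2M/(N(N-1))$. For the link efficiency, the sketch below assumes the definition $\mathcal{E} = 1 - L/M$ (average shortest path length over number of edges); this definition is not stated in the text, but it is consistent with the reported value, since $1 - 2.532/4868 \approx 0.9995$. Function names are illustrative.

```python
def density(n_vertices, n_edges):
    """Fraction of the possible undirected edges that are present: 2M / (N (N - 1))."""
    return 2 * n_edges / (n_vertices * (n_vertices - 1))


def link_efficiency(avg_shortest_path, n_edges):
    """Assumed definition: 1 - L / M (L: average shortest path length, M: edges)."""
    return 1 - avg_shortest_path / n_edges


print(density(4, 6))  # 1.0 for a complete graph on four vertices
# L from Tab. 3, M from the whole-network edge count of Sec. 5 (an approximation,
# since L refers to the giant component).
print(round(link_efficiency(2.532, 4868), 4))  # 0.9995
```

Under this definition the efficiency approaches 1 whenever distances are short compared with the number of links, which is why a sparse graph can still score very high.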
4.3 Community structure

The community structure of a graph is a global property, but a separate section is dedicated to it because it plays a special role, of particular importance for program evaluation. In the PON R&C network 15 communities are found with the Newman-Girvan algorithm [41], composed of 207, 136, 129, 113, 75, 29, 8, 7, 7, 7, 6, 6, 5, 5, and 4 vertices, respectively. The algorithm consists of iteratively removing from the graph the edge with the highest edge betweenness and recalculating the edge betweenness for the new graph obtained at each step. This procedure generates a dendrogram of sets of communities, from which the set with the largest modularity is chosen. The community structure of the PON R&C network is shown in Fig. 8. Its importance lies in the fact that it is an unbiased way of discovering the existence of groups within a network of relationships, and highlighting such groups can be very important for the analysis of a productive system like the one described by the graph under examination. The PON R&C network shows strongly heterogeneous communities, with hugely populated groups alongside very small ones. An important point for an ex post program evaluator is that larger communities tend to include important nodes. For example, the biggest community, made of 186 vertices, includes the CNR, which shows the record values for all the centralities, as stated in Sec. 4.1.
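The Newman-Girvan procedure described above can be sketched in pure Python: edge betweenness is computed with Brandes' algorithm, the highest-scoring edge is removed, betweenness is recomputed, and the partition into connected components with the highest modularity (evaluated on the original graph) is kept. This is a didactic sketch, not the implementation used for Fig. 8; its cost grows roughly as $O(M^2 N)$, and the two-clique toy graph is hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes' algorithm (unweighted): shortest-path counts through every edge."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return eb  # every pair counted in both directions; the ranking is unaffected


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp, queue = {s}, deque([s])
            seen.add(s)
            while queue:
                for w in adj[queue.popleft()]:
                    if w not in seen:
                        seen.add(w)
                        comp.add(w)
                        queue.append(w)
            comps.append(comp)
    return comps


def modularity(adj, communities):
    """Newman modularity of a partition, evaluated on the original graph."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for c in communities:
        inside = sum(1 for u in c for w in adj[u] if w in c) / 2
        degree = sum(len(adj[u]) for u in c)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def girvan_newman(adj):
    """Remove the highest-betweenness edge, recompute, keep the best-modularity split."""
    work = {v: set(nbrs) for v, nbrs in adj.items()}
    best = components(work)
    best_q = modularity(adj, best)
    while any(work.values()):
        eb = edge_betweenness(work)
        u, v = tuple(max(eb, key=eb.get))
        work[u].discard(v)
        work[v].discard(u)
        comps = components(work)
        q = modularity(adj, comps)
        if q > best_q:
            best, best_q = comps, q
    return best, best_q


# Hypothetical toy graph: two 4-cliques joined by the single edge A1-B1.
toy = {}
for group in (["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"]):
    for x in group:
        toy.setdefault(x, set()).update(y for y in group if y != x)
toy["A1"].add("B1")
toy["B1"].add("A1")
communities, q = girvan_newman(toy)
print(sorted(sorted(c) for c in communities), round(q, 3))  # two cliques, Q = 0.423
```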

Moreover, comparing the community structure obtained from network analysis with the one expected on the basis of external (economical, political, and social) considerations can enrich the evaluation by introducing a different point of view on the system under examination, not driven by “human” considerations but purely mathematical in nature. The percentage distribution of action areas within each community is shown in Fig. 9. It can be seen that the communities found with network analysis, at least the largest ones, are not directly linked to action areas. Other algorithms can be used to extract the community structure of a graph, like the leading eigenvector [42], the multi-level modularity [43] or the spin-glass [44] methods, and refined methods, e.g. a consensus analysis [45], can be used to determine which one gives the most significant results. This kind of study is outside the scope of the present paper, since it must be policy-driven, rather than research-driven, in order to be of interest for program evaluation. We plan to develop similar analyses in future work.
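The comparison shown in Fig. 9 amounts to counting, within each community, how its members are distributed among action areas. The sketch below assumes, for simplicity, that each actor is assigned a single action area (in the real data an actor may take part in projects from several areas); the actors, communities and area labels are all hypothetical.

```python
from collections import Counter


def action_area_shares(communities, area_of):
    """Percentage of each action area among the members of every community.

    Simplifying assumption: `area_of` assigns one action area per actor.
    """
    shares = []
    for members in communities:
        counts = Counter(area_of[v] for v in members)
        total = sum(counts.values())
        shares.append({area: 100 * n / total for area, n in sorted(counts.items())})
    return shares


# Hypothetical actors, communities and action-area labels.
communities = [{"a", "b", "c", "d"}, {"e", "f"}]
area_of = {"a": "Energy", "b": "Energy", "c": "Energy", "d": "Health",
           "e": "Health", "f": "Transport"}
print(action_area_shares(communities, area_of))
# [{'Energy': 75.0, 'Health': 25.0}, {'Health': 50.0, 'Transport': 50.0}]
```

If communities mirrored action areas, each dictionary would be dominated by a single area; mixed shares, as observed for the largest PON R&C communities, indicate that collaboration crosses area boundaries.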

5 Conclusions and perspectives 

In this paper we have used techniques borrowed from complex network analysis to evaluate the effects of a public funding program on the relations between the funded “actors”. The PON R&C program involves a large number of actors and extends over a period of seven years (2007-2013). The dataset consists entirely of Open Data, and we have shown a way of concretely using information made available by Governments, in the spirit promoted by current global guidelines. We have described the full process of knowledge management, from data acquisition to cleaning, model building and querying. The whole chain is data-oriented and focused on retaining every piece of available information, so that the output of the analysis has the highest possible accuracy.

The processed PON R&C data have been used for complex network analysis, and the resulting network has 769 vertices and 4868 edges. We have evaluated the most important centralities for each node, plus some relevant global properties of the graph. The outcome of our analysis shows a dominant role of public (and, to a lesser extent, private) research institutions within the Italian productive panorama, at least for the part portrayed by the program under examination. Universities and research centres act as the “glue” for this particular program, i.e. they are responsible for the connectedness of the network, and a failure involving some of them would be disruptive for the whole productive system. This picture was partly expected, due to the way the program was implemented, since in many cases it was mandatory to involve public research institutions in the projects. Nevertheless, this result can be useful as an ex post indicator. Moreover, we have found that the PON R&C network is anti-assortative, an unusual feature for social networks that is shared by the network built from the FP7 European research funding program [12], and one that is preferable to the more common assortative mixing for the spreading of knowledge and know-how in the productive system and for its efficiency.

We have shown that social network analysis can produce useful results for program evaluators, since it allows one to consider, in a quantitative fashion, very important aspects that are usually ignored because they are difficult to quantify. A mathematical description of the structure of relations generated by a national funding program is the example shown in this work. Indicators such as vertex and edge centralities have been used to rank the main actors involved in the program, as shown in Fig. 5 and Fig. 6. We hope that the procedure and the results described in the present paper can help open new perspectives, through new indicators, for decision makers, policymakers and program evaluators, providing them with a useful tool.

Many possibilities are left open by the present work. First of all, as mentioned in Sec. 3, about 78% of the total budget is concentrated in about 10% of the funded projects. This suggests that introducing information about the financial aspect into the network analysis could be interesting and meaningful for the evaluator. This could be done in many different ways, from simple visualisation techniques in sociograms, like relating the size of the nodes to the total funding received by the actor, to more refined analyses, like defining weighted networks with weights related to budgets. We plan to investigate these directions in future work. Other planned future activities include the introduction of dynamical networks, involving the study of temporal series, the refinement of network analysis techniques, e.g. by introducing different kinds of weighted networks and related features, and the generalisation of the analysis to different levels [12, 7]. Moreover, the expected improvement in the quality of Open Data (for example a greater level of detail within public research institutions, e.g. discriminating among single departments rather than whole universities) could lead to many interesting improvements of the present analysis.
Figure 4 Network structure of the Italian PON R&C funding program. Each vertex is a university, research institution or enterprise funded by the program; two vertices are connected if they are part of a TSA for at least one project. Only the principal giant component is depicted (the other connected components have fewer than six vertices each).

The set of nodes constituting the centre of the (giant component of the) PON R&C network is highlighted in red, while the main hub is depicted in orange. The centre includes public and private research institutions, all the major universities involved in the program, and also some large private enterprises.
[Figure 5 panel: Top Degree Centralities of PON R&C network]
Figure 5 The fifteen largest values of each vertex centrality for the (giant component of the) PON R&C network. The highest positions are occupied by public research institutions.
[Figure 6 panel: Top Edge Betweenness Centralities of PON R&C network]
Figure 6 The fifteen largest values of edge betweenness centrality for the (giant component of the) PON R&C network. The highest positions are occupied by links involving public research institutions.
Figure 7 The degree distribution of the network based on the PON R&C funding program. The tail is fitted to a power-law function of the form $P(k) = A k^{-\gamma}$ with $A = 4.156 \pm 0.375$ and $\gamma = 1.998 \pm 0.040$. The vertical grey line in Fig. 7(a) shows the position of the cutoff (corresponding to the upper bound of the median interval) used to fit the tail. Part (b) shows the tail and the fitted line in log-log scale.
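The tail fit of Fig. 7 can be illustrated with an ordinary least-squares line through $(\log k, \log P(k))$ for $k \ge k_{\min}$, since $\log P(k) = \log A - \gamma \log k$. The fitting procedure actually used for Fig. 7 is not specified here, so this is only an assumed sketch; maximum-likelihood estimators are generally preferred for power-law tails because log-log regression can be biased. The synthetic data are hypothetical.

```python
import math


def fit_power_law_tail(pk, k_min):
    """Least-squares fit of log P(k) = log A - gamma log k for k >= k_min.

    `pk` maps each degree k to its empirical probability P(k). Returns (A, gamma).
    """
    pts = [(math.log(k), math.log(p)) for k, p in pk.items() if k >= k_min and p > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return math.exp(my - slope * mx), -slope


# Synthetic check on an exact power law; k_min = 7 mimics a cutoff at the
# upper bound of the median interval 6 < m < 7.
exact = {k: 4.0 * k ** -2.0 for k in range(1, 60)}
print(fit_power_law_tail(exact, k_min=7))  # approximately (4.0, 2.0)
```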




Figure 8 The community structure of the (giant component of the) PON R&C network. 
14 communities are highlighted, found with the Newman-Girvan algorithm. 
Figure 9 Distribution of action areas within each community.



